Abstract

In this report we study the problem of determining three-dimensional
orientations for noisy projections of randomly oriented identical particles. The
problem is of central importance in the tomographic reconstruction of the
density map of macromolecular complexes from electron microscope images, and
it has been studied intensively for more than 30 years.

We analyze the computational complexity of the orientation problem and show
that while several variants of the problem are NP-hard, inapproximable and
fixed-parameter intractable, some restrictions are polynomial-time
approximable within a constant factor or even solvable in nondeterministic
logarithmic space. The orientation search problem is formalized as a constrained
line arrangement problem that is of independent interest. The negative complexity
results give a partial justification for the heuristic methods used in orientation
search, and the positive complexity results on orientation search also have
positive implications for the problem of finding functionally analogous genes.

A preliminary version "The Computational Complexity of Orientation Search
in Cryo-Electron Microscopy" appeared in Proc. ICCS 2004, LNCS 3036, pp.
231-238, Springer-Verlag 2004.
1 Introduction

Structural biology studies how biological systems are built. In particular,
determining three-dimensional electron density maps of macromolecular complexes,
such as proteins or viruses, is one of the most important tasks in structural
biology [?].

Standard techniques to obtain three-dimensional density maps of such particles
(at atomic resolution) are X-ray diffraction (crystallography) and nuclear
magnetic resonance (NMR) studies. However, X-ray diffraction requires that
the particles can form three-dimensional crystals, and the applicability of NMR
is limited to relatively small particles [?]. For example, there are many
well-known viruses that do not seem to crystallize and are too large for NMR
techniques. (To the best of our knowledge, NMR techniques can currently be
applied only up to a size of 1 MDa [?], while viruses are typically at least ten
times larger.)

A more flexible way to reconstruct density maps is offered by cryo-electron
microscopy [?, 15]. Currently the resolution of cryo-electron microscopy
reconstructions is not quite as high as the resolutions obtainable by
crystallography or NMR, but it is improving steadily.

Reconstruction of density maps by cryo-electron microscopy consists of the 
following subtasks: 

Specimen preparation. A thin layer of water containing a large number 
of identical particles of interest is rapidly plunged into liquid ethane to 
freeze the specimen very quickly. Quick cooling prevents water from 
forming regular structures [3]. Moreover, the particles get frozen in
random orientations in the iced specimen. 

Electron microscopy. The electron microscope produces an image representing
a two-dimensional projection of the iced specimen. This image is
called a micrograph. Unfortunately, the electron beam of the microscope
rapidly destroys the specimen, so accurate (low-noise) images of it cannot be
obtained.

Particle picking. Individual projections of particles are extracted from the
micrograph. There are efficient methods to do that, see e.g. [21, ?]. The
number of projections obtained may be thousands or even more.

Orientation search. The orientations of the extracted projections (i.e., the
projection direction of each extracted particle) are determined. There are a few
heuristic approaches for finding the orientations. For further details, see
Section 2.

Reconstruction. If the orientations of the projections are known, then quite
standard tomography techniques can be applied to construct the three-
dimensional electron density map from the projections [?].

For a broader view of the reconstruction process, see Figure 1.

In this report we study the computational complexity of the orientation search
problem, which is currently the major bottleneck in the reconstruction process.
On the one hand, we show that several variants of the task are computationally
very difficult. This justifies (to some extent) the heuristic approaches used in
practice. On the other hand, we give exact and approximate polynomial-time
algorithms for some special cases of the task that are applicable, e.g., to the
seemingly different task of finding functionally analogous genes [17].

The rest of this report is organized as follows. In Section 2 the orientation
search problem is described. Section 3 analyzes the computational complexity
and approximability of the orientation search problem. As an abstract
formulation of the search problem we use certain constrained line arrangement
problems that are of independent interest. The report is concluded in the final
section.

2 The Orientation Search Problem 

A density map is a mapping $D : \mathbb{R}^3 \to \mathbb{R}$ with a compact support. An
orientation o is a rotation of the three-dimensional space, and it can be described,
e.g., by a three-dimensional rotation matrix.

A projection p of a three-dimensional density map D to orientation o is the
integral

$$p(x, y) = \int_{-\infty}^{\infty} D\big(R_o\,[x, y, z]^T\big)\,dz$$

where $R_o$ is a three-dimensional rotation matrix, i.e., the mass of D is projected
onto a plane passing through the origin and determined by the orientation o.
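The projection integral can be illustrated numerically. The following is a minimal sketch, not part of the original method: it assumes the density D is given as a finite list of point masses (a hypothetical stand-in for a voxel grid) and replaces the integral over z by summing the masses that fall into each pixel. The helper names `rotation_z` and `project_point_masses` are our own.

```python
import math


def rotation_z(theta):
    """Rotation matrix (nested lists) for an angle theta about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def project_point_masses(masses, R, size, cell=1.0):
    """Discrete stand-in for p(x, y) = integral of D(R [x, y, z]^T) dz.

    masses: list of ((qx, qy, qz), weight) pairs modelling the density D.
    Since q = R [x, y, z]^T is equivalent to [x, y, z]^T = R^T q, a mass at q
    lands at the first two coordinates of R^T q; summing the masses that fall
    into each cell plays the role of integrating over z.
    Returns a size x size grid; cell (size//2, size//2) covers [0, cell)^2.
    """
    grid = [[0.0] * size for _ in range(size)]
    half = size // 2
    for q, weight in masses:
        # k-th coordinate of R^T q is the dot product of column k of R with q
        x = sum(R[r][0] * q[r] for r in range(3))
        y = sum(R[r][1] * q[r] for r in range(3))
        col = math.floor(x / cell) + half
        row = math.floor(y / cell) + half
        if 0 <= row < size and 0 <= col < size:
            grid[row][col] += weight
    return grid
```

For example, two masses that differ only in their z-coordinate land in the same pixel under any rotation about the z-axis, which is the discrete counterpart of integrating along the projection direction.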

Projections of physical densities can be produced, e.g., by X-rays or electron
microscopy. In practice, density maps are usually represented as three-
dimensional regular grids of finite-precision numbers, depending on the
accuracy of the scanning device, but in this report we do not need to consider the
actual representations of projections or density maps.

Based on the above definitions, the orientation search task is, given projections
$p_1, \ldots, p_n$ of the same underlying but unknown density map D, to find good
orientations $o_1, \ldots, o_n$ for them. There are several heuristic definitions of what
good orientations for the projections are.

One possibility is to choose those orientations that determine a good density
map, although it might not be obvious what a good density map is, nor how
it should be constructed from oriented projections. A standard solution is to
compare how well the given projections fit the projections of the reconstructed
density map. This kind of definition of good orientations suggests
an Expectation Maximization-type procedure of repeatedly finding the best
model for fixed orientations and the best orientations for a fixed model, see
e.g. [?, ?, 20, 22, ?]. Due to the strong dependency on the reconstruction
method, it is not easy to say much analytically about this approach in general
(not even whether it converges). In practice, this approach to orientation search
works successfully if an approximate density map of the particle is available
to be used as an initial model.

Figure 1: The reconstruction process. [Flowchart. Recoverable stage labels: raw
EM data, simulated data, and other data (EM, X-ray, SAXS, genetic, biochemical);
digitized micrographs (image processing, CTF, ...); particle picking; orientation
search and model-based search; density map calculation; model; model validation;
other models (X-ray, EM, ...); more EM data; model refinement; model usage
(visualization, fitting (X-ray, ...), databases, pattern matching, further analysis
(data mining, ...)).]

Figure 2: Two projections of density D.

The orientations can also be determined by common lines [2]. Let $p_i$ and $p_j$
be projections of a density map D onto planes corresponding to orientations $o_i$
and $o_j$, respectively; see Figure 2. All one-dimensional projections of D onto a
line passing through the origin in the plane corresponding to the orientation $o_i$
($o_j$) can be computed from the projection $p_i$ ($p_j$); this collection of projections
of $p_i$ ($p_j$) is also called the sinogram of $p_i$ ($p_j$). As the two planes intersect,
there is a line for which the projections of $p_i$ and $p_j$ agree. This line (which
actually is a vector, since the one-dimensional projections are oriented, too) is
called the common line of $p_i$ and $p_j$; see Figure 3.
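The geometry of a common line can be made concrete with a short sketch. Assuming the projection convention $p(x,y) = \int D(R_o[x,y,z]^T)\,dz$, the image plane of an orientation with rotation matrix R is spanned by the first two columns of R and its normal is the third column; the common line of two planes then points along the cross product of their normals, and its rotation angles in the two internal coordinate systems (the angles shown in Figure 3) follow from `atan2`. The function names are hypothetical, and matrices are plain nested lists.

```python
import math


def column(R, k):
    """k-th column of a 3x3 matrix given as nested lists."""
    return [R[0][k], R[1][k], R[2][k]]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def common_line_angles(R_i, R_j):
    """Angles (alpha_i, alpha_j) in [0, 2*pi) of the common line of two
    projections, measured in each projection plane's own coordinates."""
    d = cross(column(R_i, 2), column(R_j, 2))  # along both planes
    norm = math.sqrt(dot(d, d))
    if norm < 1e-12:
        raise ValueError("parallel projection directions: no unique common line")
    d = [x / norm for x in d]
    alpha_i = math.atan2(dot(d, column(R_i, 1)), dot(d, column(R_i, 0)))
    alpha_j = math.atan2(dot(d, column(R_j, 1)), dot(d, column(R_j, 0)))
    return alpha_i % (2 * math.pi), alpha_j % (2 * math.pi)
```

Using the same vector d for both planes fixes the orientation of the common line consistently, which matters because the one-dimensional projections are oriented.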

If the projections are noiseless, then the pairwise common lines of three
projections already determine the relative orientations of the projections in
three-dimensional space uniquely (except for the handedness), provided that the
possible symmetries of the particle are taken into account. Furthermore, these
orientations can be computed by only a few arithmetic and trigonometric
operations [2].

However, the projections produced by the electron microscope are extremely
noisy, and so it is highly unlikely that two projections have one-dimensional
projections that are equal. In this case it would be natural to try to find the
best possible approximate common lines, i.e., a pair of approximately equal
rows from the sinograms of the two projections. Several heuristics for the
problem have been proposed [?]. However, they usually assume that the density
map under reconstruction is highly symmetric, which radically improves the
signal-to-noise ratio. In Section 3 we partially justify the use of heuristics by
showing that many variants of the orientation search problem are
computationally very difficult.
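In the noisy setting, the search for an approximate common line amounts to comparing sinogram rows. A brute-force sketch follows; it assumes each sinogram is given as a list of equal-length numeric rows and uses the squared Euclidean distance as a (hypothetical) dissimilarity score, returning the l best row pairs. Keeping several candidates rather than one anticipates the relaxation studied in Section 3.1.

```python
import heapq


def row_distance(row_a, row_b):
    """Squared Euclidean distance between two sinogram rows of equal length."""
    return sum((x - y) ** 2 for x, y in zip(row_a, row_b))


def candidate_common_lines(sino_i, sino_j, l):
    """Return the l best (a, b, distance) triples, where a indexes a row of
    sino_i and b a row of sino_j; brute force over all row pairs."""
    pairs = ((a, b, row_distance(ra, rb))
             for a, ra in enumerate(sino_i)
             for b, rb in enumerate(sino_j))
    return heapq.nsmallest(l, pairs, key=lambda t: t[2])
```

The cost is quadratic in the number of sinogram rows per projection pair, which matches the observation later in this report that practical methods score all potential common lines.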

3 The Complexity of Orientation Search 

In this section we show that finding good orientations using common lines is 
computationally very difficult in general but it has some efficiently solvable 
special cases. The results are described in three phases: First, we consider 
the decision versions of the orientation search problem. Second, we study 
the approximability of several optimization variants. Finally, we examine the 
parameterized (in)tractability of the problem. 

We would like to point out that some of the results are partially similar to
the results of Hallett and Lagergren [17] for their problem Core-Clique, which
models the problem of finding functionally analogous genes. However, our
problem of finding good orientations based on common lines differs from the
problem of finding functionally analogous genes, e.g., by its geometric nature
and by its very different application domain. Furthermore, we provide relevant
positive results for finding functionally analogous genes: we describe an
approximation algorithm with guaranteed approximation ratio $2\beta(1 - o(1))$
if the distances between genes satisfy the triangle inequality within a factor
$\beta$.

3.1 Decision Complexity 

As mentioned in Section 2, the pairwise common lines cannot be detected
reliably when the projections are very noisy. A natural relaxation is to allow
several common line candidates for each pair of projections. In this section
we study the problem of deciding whether there exist common lines in given
sets of pairwise common line candidates that determine consistent orientations.
We show that some formulations are NP-complete in general, but there are
nontrivial special cases that are solvable in nondeterministic logarithmic space.
(For further information about computational complexity and complexity classes
of decision problems, see e.g. [22].)

Figure 3: Two projection directions presented as great circles and their
common line k specified with the rotation angles $\alpha_i$ and $\alpha_j$ in the internal
coordinate systems of the two circles.

The common lines-based orientation search problem can be modeled at a high
level as the problem of finding an n-clique in an n,m-partite graph
$G = (V_1, \ldots, V_n, E)$, i.e., a graph consisting of independent sets $V_1, \ldots, V_n$ of size m.

Problem 1 (n-clique in an n,m-partite graph) Given an n,m-partite graph
$G = (V_1, \ldots, V_n, E)$, decide whether there is an n-clique in G.

Problem 1 can be interpreted as the orientation search problem in the following
way: each group $V_i$ describes the possible orientations of the projection $p_i$, and
each edge connecting two oriented projections says that the projections in the
corresponding orientations are consistent with each other.
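For small instances, Problem 1 can be decided by exhaustive search over the $m^n$ ways of choosing one vertex per group. The sketch below is only illustrative (its running time is exponential in n); the encoding of groups as lists and of edges as vertex pairs is our own assumption.

```python
from itertools import combinations, product


def has_n_clique(groups, edges):
    """Decide Problem 1 by brute force over all choices of one vertex per group.

    groups: list of lists of hashable vertex names (V_1, ..., V_n)
    edges: iterable of 2-element tuples, treated as undirected
    Returns a witness clique (tuple) or None.
    """
    adjacent = {frozenset(e) for e in edges}
    for choice in product(*groups):
        if all(frozenset((u, v)) in adjacent for u, v in combinations(choice, 2)):
            return choice
    return None
```

In orientation-search terms, a returned tuple picks one candidate orientation per projection such that every pair of picks is mutually consistent.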

On the one hand, already three different orientations for each projection can make
the problem NP-complete:

Theorem 1 Problem 1 is NP-complete if $m \geq 3$.

Proof. Clearly, the problem is in NP since one can check in time polynomial
in |G| whether a given subset of the vertices of G forms an n-clique.

We show the NP-hardness of Problem 1 by reduction from the graph
k-colorability problem:

Problem 2 (graph k-colorability [26]) Given a graph $G = (V, E)$ and a
positive integer k, decide whether G is k-colorable, i.e., whether there is a
mapping $f : V \to \{1, \ldots, k\}$ such that if $\{u, v\} \in E$ then $f(u) \neq f(v)$.

Let $G' = (V, E')$ be the graph that we would like to color with k colors. The
polynomial-time reduction to a corresponding instance $G = (V_1, \ldots, V_n, E)$ of
Problem 1 is as follows. For each vertex $i \in V$ there is a group $V_i$ consisting
of k vertices $v_i^1, \ldots, v_i^k$. Each vertex in $V_i$ corresponds to one coloring of the
vertex $i \in V$. There is an edge $\{v_i, v_j\} \in E$, $v_i \in V_i$, $v_j \in V_j$, $i \neq j$, if and only
if $\{i, j\} \notin E'$ or $v_i$ and $v_j$ are of different color.

Clearly, the graph $G' = (V, E')$ is k-colorable if and only if there is an n-clique
in the corresponding n,k-partite graph $G = (V_1, \ldots, V_n, E)$: the members of the
groups $V_i$ that correspond to a proper coloring form an n-clique in G, and
conversely, an n-clique chooses one color for each vertex such that adjacent
vertices receive different colors. □
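The reduction in the proof of Theorem 1 can be written out directly. In this sketch (names and encodings are hypothetical), vertex `(i, c)` of group $V_i$ stands for "vertex i receives color c"; a small brute-force clique search is included so that the output of the reduction can be checked.

```python
from itertools import combinations, product


def coloring_to_partite(vertices, edges, k):
    """Reduction of Theorem 1: graph k-colorability -> Problem 1.

    (i, c) and (j, d), i != j, are adjacent iff {i, j} is not an edge of the
    input graph or c != d.
    """
    E = {frozenset(e) for e in edges}
    groups = [[(i, c) for c in range(k)] for i in vertices]
    partite_edges = set()
    for i, j in combinations(vertices, 2):
        for c, d in product(range(k), repeat=2):
            if frozenset((i, j)) not in E or c != d:
                partite_edges.add(frozenset(((i, c), (j, d))))
    return groups, partite_edges


def has_n_clique(groups, partite_edges):
    """Brute-force check for an n-clique with one vertex per group."""
    for choice in product(*groups):
        if all(frozenset((u, v)) in partite_edges
               for u, v in combinations(choice, 2)):
            return choice
    return None
```

A triangle yields an n-clique for k = 3 (one per proper 3-coloring) but none for k = 2, and the colors read off a clique form a proper coloring.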

On the other hand, the problem can be solved in nondeterministic logarithmic
space if the number of orientations for each projection is at most two:
Theorem 2 Problem 1 is NL-complete if $m \leq 2$.

Proof. The problem is in NL since it can be reduced in logarithmic space to
the m-satisfiability problem with $m \leq 2$, which is an NL-complete problem:

Problem 3 (m-satisfiability [26]) Given a set U of boolean variables and a
set C of clauses $c \in C$, $|c| \leq m$, decide whether there is a truth value assignment
$f : U \to \{0, 1\}$ that satisfies all clauses in C, i.e., whether there is a truth value
assignment f that sets at least one literal¹ true in each clause of C.

Note first that any instance of the problem with $m < 2$ can be trivially reduced
to the case with $m = 2$. The reduction from Problem 1 with $m = 2$ to
Problem 3 with $m = 2$ is as follows. Let the instance of Problem 1 be
$G = (V_1, \ldots, V_n, E)$ and the instance of Problem 3 be $(U, C)$. For each group
$V_i = \{v_i^0, v_i^1\}$ there is a boolean variable $u_i$ whose truth value assignments $u_i = 0$
and $u_i = 1$ correspond to the vertices $v_i^0$ and $v_i^1$, respectively. The set C contains
a clause $(u_i = 1 - a) \vee (u_j = 1 - b)$ if and only if $\{v_i^a, v_j^b\} \notin E$.

If there is a truth value assignment f satisfying all clauses in C, then the vertices
corresponding to the truth value assignments form an n-clique V' in G. Assume
to the contrary that f satisfies all clauses in C but the corresponding set V' of
n vertices does not form an n-clique. Then there are at least two vertices $v_i^a$
and $v_j^b$ in V' such that $\{v_i^a, v_j^b\} \notin E$. But then C contains the clause
$(u_i = 1 - a) \vee (u_j = 1 - b)$, which f does not satisfy, a contradiction. If no truth
value assignment f satisfies all clauses in C, then any set V' of n vertices, one from
each group, contains at least two vertices $v_i^a$ and $v_j^b$ such that $\{v_i^a, v_j^b\} \notin E$.

Thus, the graph G contains an n-clique if and only if there is a truth value
assignment f that satisfies all clauses in C.

The problem is also NL-hard since Problem 3 with $m = 2$ can be reduced to
it in logarithmic space in a similar way. □
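The logarithmic-space argument of Theorem 2 is not reproduced here, but the reduction itself is easy to sketch. The following hypothetical code builds the 2-satisfiability instance and solves it with the standard polynomial-time implication-graph algorithm using Kosaraju's strongly connected components; this is a swapped-in textbook technique, not the nondeterministic log-space procedure of the proof. A satisfying assignment f yields the n-clique $\{v_i^{f(u_i)}\}$.

```python
from itertools import combinations, product


def partite_to_2sat(n, edges):
    """Groups V_i = {(i, 0), (i, 1)}; for every non-adjacent pair
    {(i, a), (j, b)} add the clause (u_i = 1-a) or (u_j = 1-b).
    A literal (var, value) means "u_var = value"."""
    E = {frozenset(e) for e in edges}
    clauses = []
    for i, j in combinations(range(n), 2):
        for a, b in product((0, 1), repeat=2):
            if frozenset(((i, a), (j, b))) not in E:
                clauses.append(((i, 1 - a), (j, 1 - b)))
    return clauses


def strongly_connected_components(adj):
    """Iterative Kosaraju; labels follow a topological order of the condensation."""
    n = len(adj)
    order, seen = [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(adj[s]))]
        while stack:
            v, it = stack[-1]
            w = next(it, None)
            if w is None:
                stack.pop()
                order.append(v)  # v is finished
            elif not seen[w]:
                seen[w] = True
                stack.append((w, iter(adj[w])))
    radj = [[] for _ in range(n)]
    for v in range(n):
        for w in adj[v]:
            radj[w].append(v)
    comp, label = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = label
        stack = [s]
        while stack:
            v = stack.pop()
            for w in radj[v]:
                if comp[w] == -1:
                    comp[w] = label
                    stack.append(w)
        label += 1
    return comp


def node(var, val):
    """Implication-graph node of the literal "u_var = val"."""
    return 2 * var + val


def solve_2sat(n, clauses):
    """Return a list of 0/1 values satisfying all clauses, or None."""
    adj = [[] for _ in range(2 * n)]
    for (x, xv), (y, yv) in clauses:
        adj[node(x, 1 - xv)].append(node(y, yv))  # not first -> second
        adj[node(y, 1 - yv)].append(node(x, xv))  # not second -> first
    comp = strongly_connected_components(adj)
    if any(comp[node(v, 0)] == comp[node(v, 1)] for v in range(n)):
        return None
    # the literal that comes later in topological order is set true
    return [1 if comp[node(v, 1)] > comp[node(v, 0)] else 0 for v in range(n)]
```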

The formulation of the orientation search problem as Problem 1 seems to
miss some of the geometric nature of the problem. As a first step toward
the final formulation, let us consider the problem of finding a constrained line
arrangement, the constraint being that any two lines of the arrangement are
allowed to intersect only at a given set of points, each such set being of size
at most l:

Problem 4 (l-constrained line arrangement) Given sets $P_{ij} \subset \mathbb{R}^2$, $|P_{ij}| \leq l$,
$1 \leq i < j \leq n$, decide whether there exist lines $L_1, \ldots, L_n$ in $\mathbb{R}^2$ such that $L_i$
and $L_j$ intersect only at some $p \in P_{ij}$ for all $1 \leq i < j \leq n$.

¹ Recall that literals are just boolean formulas of type $x = 0$ and $x = 1$ where x is a
variable.
This problem has some interest of its own since line arrangements are one of
the central concepts in computational and discrete geometry [?, ?]. If we
require that the lines are in general position, i.e., that no two of them are parallel
and no three of them intersect in the same point, then we get the following
hardness result:

Theorem 3 Problem 4 is NP-complete if $l \geq 9$.

Proof. The problem is in NP for all $l \geq 0$ since, given a choice of intersection
points $p_{ij} \in P_{ij}$, it can be checked in polynomial time whether there are lines
$L_1, \ldots, L_n$ such that $L_i$ and $L_j$ intersect at $p_{ij}$ for each $1 \leq i < j \leq n$.

The NP-hardness of the problem can be shown by a polynomial-time reduction
from Problem 1 as follows. Let $G = (V_1, \ldots, V_n, E)$ be the instance of
Problem 1. For each vertex $v_{i,a} \in V_i$ we have a line $L_{i,a}$. The set $P_{ij}$ contains the
intersection point of the lines $L_{i,a}$ and $L_{j,b}$ if and only if $\{v_{i,a}, v_{j,b}\} \in E$. Since every
group contains $m = 3$ vertices, each set $P_{ij}$ contains at most $m^2 = 9$ points. We can
use this reduction if we are able to find nm lines on the plane in general position
(for a discussion on what being in general position means, see [21]). Actually, it
is sufficient to require that

1. no two lines are parallel,

2. no three lines intersect in the same point, and

3. if $p_{ij_1} \in P_{ij_1}$, $p_{ij_2} \in P_{ij_2}$ and $p_{ij_3} \in P_{ij_3}$ are on the same line, then this line is
one of the lines $L_{i,a}$.

Non-vertical lines $y = gx + h$ can be mapped to points $(g, h) \in \mathbb{R}^2$ and vice
versa. The nm lines can be generated by considering the pairs $(g, h) \in \mathbb{N}^2$
of positive integers in the lexicographical order $\prec$: $(g_1, h_1) \prec (g_2, h_2)$ if and only
if $g_1 < g_2 \vee (g_1 = g_2 \wedge h_1 < h_2)$, and choosing some points $(g, h)$ according to
rules that are equivalent to the above rules for lines. The rules for choosing
the points are:

1. each chosen point has a unique first coordinate g; we call g the column
index of the point,

2. no line passes through three chosen points, and 

3. three lines, each passing through two chosen points, can intersect in the 
same point only if that point is chosen, too. 

We still have to show that it is sufficient to consider only a polynomial number
of points in $\mathbb{N}^2$ in order to find nm points that satisfy the given requirements.
Let the number of chosen points at a certain stage of the construction be k,
with one point chosen from each column $0, \ldots, k - 1$. Then the maximum
number of points we have to consider in column k before finding a feasible
point can be bounded above polynomially in n and m as follows:

• Exactly $\binom{k}{2}$ lines can be drawn passing through at least two chosen
points. These lines make at most $\binom{k}{2}$ points in column k infeasible.

• Two points on a plane span a line uniquely. Any four chosen points span
two different lines $L'_1$ and $L'_2$, and there are at most $3\binom{k}{4}$ such pairs of lines
(each set of four points can be split into two pairs in three ways). Each of the
other $k - 4$ chosen points can span at most one line with the points in column k
that passes through the intersection point of the lines $L'_1$ and $L'_2$. Thus, the
number of points in column k that are infeasible due to this is at most
$3(k - 4)\binom{k}{4}$.

Thus, the number of points in column k that have to be considered before
finding the first point that does not violate our selection rules, and hence can
be chosen as the (k+1)st point, is at most

$$\binom{k}{2} + 3(k - 4)\binom{k}{4} + 1,$$

which is clearly polynomial in nm when $k \leq nm$. □
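The point-selection procedure in the proof of Theorem 3 can be sketched with exact rational arithmetic. The sketch assumes the greedy rule "take the smallest feasible h in each new column g", which is one way of realising the lexicographic search described above; the helper names are hypothetical. Each chosen point (g, h) encodes the line y = gx + h.

```python
from fractions import Fraction
from itertools import combinations


def collinear(p, q, r):
    """True if the three points lie on one line (zero cross product)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]) == 0


def intersection(p1, p2, q1, q2):
    """Exact intersection of line p1p2 with line q1q2, or None if parallel."""
    d = (p2[0] - p1[0]) * (q2[1] - q1[1]) - (p2[1] - p1[1]) * (q2[0] - q1[0])
    if d == 0:
        return None
    t = Fraction((q1[0] - p1[0]) * (q2[1] - q1[1])
                 - (q1[1] - p1[1]) * (q2[0] - q1[0]), d)
    return (p1[0] + t * (p2[0] - p1[0]), p1[1] + t * (p2[1] - p1[1]))


def feasible(p, chosen):
    """Check selection rules 2 and 3 for a candidate point p
    (rule 1 holds because p is taken from a fresh column)."""
    # Rule 2: p must not lie on a line through two chosen points.
    if any(collinear(q, r, p) for q, r in combinations(chosen, 2)):
        return False
    # Rule 3: a new line (through p and a chosen q) must not pass through the
    # intersection of two old lines spanned by four distinct chosen points
    # (lines sharing a chosen point meet at that point, which is allowed).
    chosen_set = set(chosen)
    for (a, b), (c, d) in combinations(combinations(chosen, 2), 2):
        if len({a, b, c, d}) < 4:
            continue
        x = intersection(a, b, c, d)
        if x is None or x in chosen_set:
            continue
        if any(collinear(p, q, x) for q in chosen):
            return False
    return True


def choose_points(count):
    """Pick one point per column g = 1, 2, ..., with the smallest feasible h."""
    chosen = []
    g = 1
    while len(chosen) < count:
        h = 1
        while not feasible((g, h), chosen):
            h += 1
        chosen.append((g, h))
        g += 1
    return chosen
```

Calling `choose_points(n * m)` would produce the nm lines needed by the reduction; by the counting argument above, only polynomially many candidates are rejected in each column.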

The result can be slightly improved if we relax the general position requirement
used in Theorem 3, e.g., if we also allow parallel lines in the arrangement:

Theorem 4 Problem 4 is NP-complete if $l \geq 6$.

Proof. The problem is in NP as noted in the proof of Theorem 3.

The NP-hardness of the problem can be shown by a reduction from Problem 3 as
follows. Given an instance (U, C) of the m-satisfiability problem, we construct
point sets $P_{ij}$ for $1 \leq i \leq |U|$ and $1 \leq j \leq |C|$. This is done by representing
the variables and clauses by suitable line arrangements and constraining their
intersection points. Each boolean variable $u_i \in U$ is represented by two vertical
lines $L_{i,0}$ and $L_{i,1}$ representing the truth value assignments $u_i = 0$ and $u_i = 1$,
respectively. Each clause $c_j \in C$ is represented by $|c_j|$ horizontal lines
$L_{j,1}, \ldots, L_{j,|c_j|}$. The intersection point of the lines $L_{i,a}$ and $L_{j,b}$, corresponding to
the truth value assignment $u_i = a$ and the b-th literal in the clause $c_j$, is in $P_{ij}$
if and only if the truth value assignment $u_i = a$ does not falsify the b-th literal
in $c_j$, which fixes the sets $P_{ij}$. These lines are placed on the plane in such a way that all
vertical lines have different horizontal coordinates and all horizontal lines have
different vertical coordinates.

Without loss of generality, we assume that $|U| > m$ and $|C| > 2$. This ensures
that all lines that are spanned by the points in the sets $P_{ij}$ and correspond to the
clauses must be horizontal, and all lines that correspond to the variables must
be vertical, in any feasible line arrangement corresponding to a satisfying truth
assignment.

If there is a satisfying truth assignment f for the set C of clauses, then the
corresponding line arrangement (the line $L_{i,f(u_i)}$ for each variable and, for each
clause, the line of one literal that f satisfies) intersects only in the allowed points
that belong to the sets $P_{ij}$. If there are lines intersecting only at the allowed
points, then the vertical lines uniquely determine a truth value assignment f
that satisfies all clauses in C.

Thus, the lines can be arranged on the plane in such a way that they intersect only
at allowed intersection points in the sets $P_{ij}$ if and only if there is a truth value
assignment satisfying all clauses in C. Furthermore, if the size of the largest
clause is m, then the size of the largest set $P_{ij}$ is at most $2m = l$. As Problem 3
is NP-complete when $m \geq 3$, Problem 4 is NP-complete when $l \geq 6$. □
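The construction in the proof of Theorem 4 can be sketched as follows. The concrete coordinates (vertical lines at x = 2i + a and horizontal lines at y = width·j + b) are our own choice; any placement with distinct coordinates would do. The search only ranges over the intended family of arrangements, one vertical line per variable and one horizontal line per clause, which is the family the proof argues is forced under its assumptions; it is an exponential illustration, not a decision procedure for general Problem 4 instances.

```python
from itertools import product


def sat_to_line_constraints(n_vars, clauses):
    """Point sets P_ij of the Theorem 4 construction, with illustrative coordinates.

    Variable u_i may use the vertical line x = 2*i + a (a = its truth value);
    clause c_j may use the horizontal line y = width*j + b (b = index of the
    literal it relies on). A literal (var, value) means "u_var = value".
    """
    width = max(len(c) for c in clauses)
    P = {}
    for i in range(n_vars):
        for j, clause in enumerate(clauses):
            P[(i, j)] = {
                (2 * i + a, width * j + b)
                for a in (0, 1)
                for b, (var, value) in enumerate(clause)
                # u_i = a falsifies a literal only if it is about u_i
                # and asks for the other value
                if not (var == i and value != a)
            }
    return P, width


def find_arrangement(n_vars, clauses):
    """Search the intended family of arrangements; parallel lines never meet,
    so only variable/clause pairs are constrained.
    Returns (assignment, literal choice per clause) or None."""
    P, width = sat_to_line_constraints(n_vars, clauses)
    for assignment in product((0, 1), repeat=n_vars):
        for choice in product(*(range(len(c)) for c in clauses)):
            if all((2 * i + assignment[i], width * j + choice[j]) in P[(i, j)]
                   for i in range(n_vars) for j in range(len(clauses))):
                return assignment, choice
    return None
```

Each set P_ij holds at most two points per literal of $c_j$, matching the bound $|P_{ij}| \leq 2m$ in the proof.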

However, orientation search is not about arranging lines on the plane but
great circles on the (unit) sphere $S = \{(x, y, z) \in \mathbb{R}^3 : x^2 + y^2 + z^2 = 1\}$, as the
orientations and the great circles are obviously in one-to-one correspondence.
Thus, we should study great circle arrangements:

Problem 5 (l-constrained great circle arrangement) Given sets
$P_{ij} \subset S^+ = \{(x, y, z) \in S : z > 0\}$, $|P_{ij}| \leq l$, $1 \leq i < j \leq n$, decide whether there
exist great circles $C_1, \ldots, C_n$ on S such that $C_i$ and $C_j$ intersect on $S^+$ only at
some $p \in P_{ij}$ for all $1 \leq i < j \leq n$.

It can be shown that line arrangements and great circle arrangements are
equivalent through the central projection [?]:

Theorem 5 Problem 5 is as difficult as Problem 4.

Proof. Great circles on a sphere can be mapped to lines on a plane by
the central projection, and lines on a plane to great circles on a sphere by its
inverse [?]; on the open upper hemisphere $S^+$ the central projection onto the
plane $z = 1$ is a bijection. □
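The central (gnomonic) projection used in the proof of Theorem 5 maps a point (x, y, z) with z ≠ 0 to (x/z, y/z) on the plane z = 1, and a great circle with normal (a, b, c) to the line aX + bY + c = 0. Below is a minimal sketch with hypothetical function names. Antipodal points have the same image, which is consistent with Problem 5 restricting the intersection points to the upper hemisphere $S^+$.

```python
import math


def central_projection(p):
    """Project a point p = (x, y, z), z != 0, from the origin onto z = 1."""
    x, y, z = p
    if z == 0:
        raise ValueError("points with z = 0 have no central projection")
    return (x / z, y / z)


def great_circle_to_line(normal):
    """Image of the great circle {p on S : normal . p = 0}.

    For z != 0, a*x + b*y + c*z = 0 iff a*(x/z) + b*(y/z) + c = 0, so the
    image is the line a*X + b*Y + c = 0 with (a, b, c) = normal.
    """
    a, b, c = normal
    if a == 0 and b == 0:
        raise ValueError("the great circle z = 0 maps to the line at infinity")
    return (a, b, c)


def line_to_great_circle(a, b, c):
    """Inverse map: the line a*X + b*Y + c = 0 gives a unit great-circle normal."""
    norm = math.sqrt(a * a + b * b + c * c)
    return (a / norm, b / norm, c / norm)
```

Because the projection is invariant under scaling of p, the sketch accepts any nonzero point on the ray, not only points of the unit sphere.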

Still, our problem formulation is lacking some of the important ingredients of
the orientation search problem: at this stage of the orientation search it is not
possible to express the common line candidates by giving the allowed pairwise
intersection points on the sphere S, i.e., in some globally fixed coordinate
system. Rather, one can represent a common line only in the internal coordinates
of the two great circles that correspond to the two intersecting projections.
Each coordinate is in fact an angle giving the rotation angle of the common
line on the projection, as depicted in Figure 3. Hence the representation is a
pair of angles:

Problem 6 (locally l-constrained great circle arrangement on sphere)
Given sets $P_{ij} \subset [0, 2\pi) \times [0, 2\pi)$, $|P_{ij}| \leq l$, $1 \leq i < j \leq n$, decide whether there
exist great circles $C_1, \ldots, C_n$ on S such that $C_i$ and $C_j$ intersect only at some
$p \in P_{ij}$ for all $1 \leq i < j \leq n$, where p defines the angles of the common line
on $C_i$ and $C_j$.

This problem can also be shown to be equally difficult to decide:
Theorem 6 Problem 6 is NP-complete if $l \geq 6$.

Proof. The problem is in NP since it is possible to check in polynomial time 
in the total number of possible local intersection points whether a given set of 
local intersection points is realizable. 

The NP-hardness of the problem can be obtained from the proofs of Theorems 4
and 5. Indeed, all great circles corresponding to the horizontal lines in
the corresponding line arrangement are forced to be parallel by their common
intersection point. Similarly, all great circles corresponding to the vertical lines
are forced to be parallel by their common intersection point. □

Thus, deciding whether there exist consistent orientations seems to be difficult 
in general. 

3.2 Approximability 

By the results of Section 3.1, finding consistent orientations for all projections
is difficult, so we should also consider orientations that are consistent only for
a large subset of the projections, or resort to common lines that are as good as
possible.

A simple approach to finding consistent orientations for large subsets of
the projections is to look for large cliques in the n,m-partite graph
$G = (V_1, \ldots, V_n, E)$ instead of exact n-cliques. In the world of orientations this
means that instead of finding consistent orientations for all projections, we look
for consistent orientations for as many projections as we are able to and neglect
the other projections.

Containing a clique is just one example of a property a graph can have. Other
graph properties might also be useful. Thus we can formulate the problem
in a rather general form as follows:

Problem 7 (Maximum subgraph with property P in an n,m-partite graph)
Given an n,m-partite graph $G = (V_1, \ldots, V_n, E)$, find the largest
$V' \subseteq V_1 \cup \cdots \cup V_n$ such that the induced subgraph satisfies the property P and
$|V' \cap V_i| \leq 1$ for all $1 \leq i \leq n$.
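Problem 7 can be illustrated by a brute-force search that is parameterized by the property P. In this sketch (hypothetical names, exponential time), `prop` receives the chosen vertices and the edges of their induced subgraph, and `is_clique` instantiates P as completeness, which is the orientation-search case.

```python
from itertools import combinations, product


def max_subgraph_one_per_group(groups, edges, prop):
    """Brute-force Problem 7: the largest vertex set V' with at most one
    vertex per group whose induced subgraph satisfies
    prop(vertices, induced_edges). Vertex names must not be None."""
    E = {frozenset(e) for e in edges}
    best = ()
    # each group contributes nothing (None) or exactly one of its vertices
    for choice in product(*([None] + list(g) for g in groups)):
        chosen = tuple(v for v in choice if v is not None)
        if len(chosen) <= len(best):
            continue
        induced = {frozenset(p) for p in combinations(chosen, 2)
                   if frozenset(p) in E}
        if prop(chosen, induced):
            best = chosen
    return best


def is_clique(vertices, induced_edges):
    """Property P = 'the induced subgraph is complete'."""
    k = len(vertices)
    return len(induced_edges) == k * (k - 1) // 2
```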

This resembles the following fundamental graph problem in combinatorial op- 
timization and approximation algorithms: 

Problem 8 (Maximum subgraph with property P [?]) Given a graph
$G = (V, E)$, find the largest $V' \subseteq V$ such that the induced subgraph satisfies the
property P.

It is not very difficult to see that the two problems are equivalent: 
Theorem 7 Problem 7 is as difficult as Problem 8.

Proof. On the one hand, Problem 7 is a special case of Problem 8 with a
restricted graph structure and with the additional condition $|V' \cap V_i| \leq 1$ for
all i, which can be included in the property P. On the other hand, Problem 8
is a special case of Problem 7 with singleton groups $V_1, \ldots, V_{|V|}$. □

Problem 8 is very difficult with respect to several properties P [?]. By Theorem 7
these results generalize to Problem 7. Hence, for example, finding the maximum
clique in the n,m-partite graph cannot be approximated within ratio $n^{1-\epsilon}$
for any fixed $\epsilon > 0$ under standard complexity-theoretic assumptions [?]. Note
that the approximation ratio n can be achieved trivially by choosing any vertex
of G, which always forms a clique of size 1.

In practice, the techniques for finding common lines or common line candidates
actually evaluate all potential common lines of two projections (that is, in effect
all relative orientations of the two projections with respect to each other are
considered) and give them a score, which typically is the distance between
the two sinogram rows corresponding to the potential common line. Thus, we
can assume that there is always at least one feasible solution and study the
following problem:

Problem 9 (Minimum weight n-clique in a complete n,m-partite graph)
Given a complete n,m-partite graph $G = (V_1, \ldots, V_n, E)$ and a weight function
$w : E \to \mathbb{N}$, find $V' \subseteq V_1 \cup \cdots \cup V_n$ such that the weight $\sum_{u, v \in V',\, u \neq v} w(\{u, v\})$ is
minimized and $|V' \cap V_i| = 1$ for all $1 \leq i \leq n$.
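An exact, exponential-time solver for Problem 9 is useful as a baseline for the approximation results that follow. This sketch assumes the weight function is given as a Python callable on vertex pairs and sums each edge once; the sum over ordered pairs in Problem 9 is exactly twice this value, so the minimizer is the same.

```python
from itertools import combinations, product


def min_weight_n_clique(groups, w):
    """Exact Problem 9 by brute force: one vertex per group, minimising the
    total weight of the chosen pairs (each unordered pair counted once)."""
    best, best_weight = None, float("inf")
    for choice in product(*groups):
        weight = sum(w(u, v) for u, v in combinations(choice, 2))
        if weight < best_weight:
            best, best_weight = choice, weight
    return best, best_weight
```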

Unfortunately, it turns out that in this case the situation is extremely bad: 

Theorem 8 Problem 9 with $m \geq 3$ is not polynomial-time approximable within
$2^{n^k}$ for any fixed $k > 0$ if $P \neq NP$.

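Proof. If Problem 9 were approximable within $2^{n^k}$ for some fixed $k > 0$, then
the NP-complete Problem 1 could be solved in polynomial time by using the
following weight function for the edges of the complete n,m-partite graph on the
groups $V_1, \ldots, V_n$ of the instance $G = (V_1, \ldots, V_n, E)$ of Problem 1:

$$w(e) = \begin{cases} 2^{n^{k+1}} & \text{if } e \notin E, \text{ and} \\ 1 & \text{otherwise.} \end{cases}$$

If G contains an n-clique, then the minimum weight is $n(n-1)$, so the
approximation algorithm returns a solution of weight at most $2^{n^k} n(n-1)$, which
is smaller than $2^{n^{k+1}}$ for all sufficiently large n. If G contains no n-clique, then
every solution contains a non-edge and thus has weight at least $2^{n^{k+1}}$. The
weights have polynomially many bits, so the reduction runs in polynomial time.

Thus, the problem is not approximable within $2^{n^k}$ in polynomial time provided
that $P \neq NP$. □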
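The gap created by the weight function in the proof of Theorem 8 can be checked on a toy instance. This sketch assumes the heavy weight $2^{n^{k+1}}$ for non-edges (our reading of the weight function in that proof) and an integer k ≥ 1 so that the weights stay integers; the exact brute-force optimum stands in for a hypothetical approximation algorithm, and each edge is counted once.

```python
from itertools import combinations, product


def theorem8_weights(edges, n, k):
    """Gap weights for an integer k >= 1: non-edges of the Problem 1 instance
    weigh 2**(n**(k+1)), edges weigh 1. Returns (weight function, heavy)."""
    E = {frozenset(e) for e in edges}
    heavy = 2 ** (n ** (k + 1))
    return (lambda u, v: 1 if frozenset((u, v)) in E else heavy), heavy


def min_clique_weight(groups, w):
    """Exact optimum of Problem 9 (unordered-pair sum) by brute force."""
    return min(sum(w(u, v) for u, v in combinations(choice, 2))
               for choice in product(*groups))
```

On a yes-instance even $2^{n^k}$ times the optimum stays below the heavy weight, while on a no-instance every n-clique contains a non-edge, so comparing an approximate value with the heavy weight would decide Problem 1.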

When there are only two vertices in each group, the problem admits a constant-factor
approximation, but no polynomial-time approximation scheme unless $P = NP$:

Theorem 9 Problem 9 is APX-complete if $m = 2$.
Proof. This can be shown by approximation-preserving reductions from
and to the minimum weight 2-satisfiability problem, which is known to be
APX-complete:

Problem 10 (Minimum weight 2-satisfiability [?]) Given a set U of boolean
variables, a set C of clauses $c \in C$, $|c| \leq 2$, and a weight function $w : C \to \mathbb{N}$,
find the truth value assignment $f : U \to \{0, 1\}$ that minimizes the sum of the
weights of unsatisfied clauses, i.e.,

$$\sum_{c \in C \,:\, f \text{ does not satisfy } c} w(c).$$

The reduction from Problem 10 to Problem 9 with $m = 2$ is very similar to
the reduction in the proof of Theorem 2. Each boolean variable $u_i$ is
represented by a two-set $V_i = \{v_i^0, v_i^1\}$ corresponding to the truth value assignments
$u_i = 0$ and $u_i = 1$, respectively. By the definition of Problem 9 the graph
$G = (V_1, \ldots, V_n, E)$ is complete, i.e., $E = \{\{u, v\} : u \in V_i, v \in V_j, 1 \leq i < j \leq n\}$.
The weight of the edge $e = \{v_i^a, v_j^b\} \in E$ is $w(c)$ if C contains the clause
$c = (u_i = 1 - a) \vee (u_j = 1 - b)$, i.e., the clause falsified by $u_i = a$ and $u_j = b$, and
zero otherwise.

Thus, the weight of the n-clique V' in G equals the weight of the clauses that
are not satisfied by the truth value assignment corresponding to V'. That is,
Problem 9 with $m = 2$ is at least as difficult as Problem 10.

Problem 9 with $m = 2$ can be reduced in polynomial time to Problem 10 in a
similar way. For each vertex set $V_i$ in G there is a boolean variable $u_i$, and the
vertices $v_i^0, v_i^1 \in V_i$ correspond to the two truth value assignments of $u_i$. For
each edge $e = \{v_i^a, v_j^b\}$ in E there is a clause $(u_i = 1 - a) \vee (u_j = 1 - b)$ with
weight $w(e)$.

The weight of the clauses that a truth value assignment f does not satisfy
is equal to the weight of the corresponding n-clique in G. That is, Problem 9
with $m = 2$ is at most as difficult as Problem 10.

Thus, Problem 9 with $m = 2$ is APX-complete, as claimed. □
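The correspondence between clause weights and clique weights in the proof of Theorem 9 can be verified exhaustively on small instances. This sketch assumes clauses with two literals over distinct variables, encoded as a dictionary from literal pairs to weights (a hypothetical encoding), and reads the reduction as giving the edge $\{v_i^a, v_j^b\}$ the weight of the clause that the assignment $u_i = a$, $u_j = b$ falsifies.

```python
from itertools import combinations


def wsat_to_clique_weights(clauses):
    """clauses maps ((i, x), (j, y)), i != j, to the weight of the clause
    (u_i = x or u_j = y). Choosing v_i^a and v_j^b falsifies that clause
    exactly when a = 1 - x and b = 1 - y, so that edge gets its weight."""
    weights = {}
    for ((i, x), (j, y)), wc in clauses.items():
        key = frozenset(((i, 1 - x), (j, 1 - y)))
        weights[key] = weights.get(key, 0) + wc
    return lambda u, v: weights.get(frozenset((u, v)), 0)


def unsatisfied_weight(clauses, f):
    """Total weight of clauses falsified by the 0/1 assignment f."""
    return sum(wc for ((i, x), (j, y)), wc in clauses.items()
               if f[i] != x and f[j] != y)


def clique_weight(w, f):
    """Weight of the n-clique {v_i^{f[i]}} (each edge counted once)."""
    vertices = list(enumerate(f))
    return sum(w(u, v) for u, v in combinations(vertices, 2))
```

For every assignment f the two totals coincide, which is the weight-preservation property the approximation-preserving reduction relies on.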

An easier variant of Problem 9 is the case where the edge weights satisfy the
triangle inequality within a factor $\beta$, i.e., for all edges $\{t, u\}$, $\{t, v\}$ and $\{u, v\}$
in E it holds that

$$w(\{t, u\}) \leq \beta\,\big(w(\{t, v\}) + w(\{u, v\})\big).$$
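Whether a given weight function satisfies this relaxed triangle inequality, and with which factor β, can be measured directly. The sketch below (a hypothetical helper, brute force over all triangles of the complete n,m-partite graph) returns the smallest such β; for instance, absolute differences on the real line give β = 1, while squared differences can require β > 1.

```python
from itertools import combinations, permutations, product


def smallest_beta(groups, w):
    """Smallest beta with w(t,u) <= beta * (w(t,v) + w(u,v)) for all vertices
    t, u, v from three distinct groups (the triangles of a complete
    n,m-partite graph). Returns inf if no finite beta works."""
    beta = 0.0
    for g1, g2, g3 in combinations(groups, 3):
        for triple in product(g1, g2, g3):
            for t, u, v in permutations(triple):
                lhs, rhs = w(t, u), w(t, v) + w(u, v)
                if rhs == 0:
                    if lhs > 0:
                        return float("inf")
                    continue
                beta = max(beta, lhs / rhs)
    return beta
```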

A good approximation of the minimum weight n-clique in G can be found by
finding the minimum weight n-star that contains one vertex from each group
$V_i$. The method is described by Algorithm 1.

Algorithm 1 gives constant-factor approximation guarantees, and the
approximation is stable (for details on approximation stability, see [?]):
Algorithm 1 A constant-factor approximation algorithm for finding the
minimum weight n-clique from a weighted graph.

function Minimum-Weight-Star(G, w)
    W_min ← ∞
    for i = 1, ..., n do
        for all v ∈ V_i do
            W ← 0
            for j = 1, ..., i-1, i+1, ..., n do
                W ← W + min_{u ∈ V_j} {w({u, v})}
            end for
            if W < W_min then
                W_min ← W
                v_min ← v
                i_min ← i
            end if
        end for
    end for
    V' ← {v_min}
    for j = 1, ..., i_min-1, i_min+1, ..., n do
        V' ← V' ∪ {argmin_{u ∈ V_j} {w({u, v_min})}}
    end for
    return (V', E' = {e ∈ E : e ⊆ V'})
end function



Theorem 10 The minimum weight n-clique problem is polynomial-time approximable within $2\beta(1 - o(1))$ by Algorithm 1 if the edge weights satisfy the triangle inequality within a factor $\beta$.

Proof. Let $G' = (V', E')$ be the n-clique found from the n,m-partite complete graph $G$ by Algorithm 1, and let $\mathrm{OPT}(G)$ be the weight of a minimum weight n-clique in $G$.

The weight of $G'$ can be bounded above as follows. We distribute the weight of the solution $G'$ to its vertices:

$$w(G') = \sum_{v \in V'} W(v), \qquad W(v) = \sum_{e \in E' : v \in e} \frac{w(e)}{2}.$$



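The weight of the lightest vertex in $G'$, the vertex $v_{\min}$, is at most $\mathrm{OPT}(G)/n$: for each vertex $v$ of a minimum weight n-clique, the lightest n-star centered at $v$ weighs at most the total weight of the clique edges at $v$, and these totals sum to $2\,\mathrm{OPT}(G)$. Hence $W_{\min} \le 2\,\mathrm{OPT}(G)/n$, and $W(v_{\min}) = W_{\min}/2$.

For each edge $\{u,v\} \in E'$ such that $v_{\min} \notin \{u,v\}$, it holds that

$$w(\{u,v\}) \le \beta\,\bigl[w(\{u,v_{\min}\}) + w(\{v,v_{\min}\})\bigr]$$

by the assumption. Each vertex of $V' \setminus \{v_{\min}\}$ lies on $n - 2$ such edges, and we may assume $\beta \ge 1/2$ (otherwise the triangle inequality forces all weights to be zero). Thus, the sum of the weights of the other vertices in $V'$ is

$$\sum_{v \in V' \setminus \{v_{\min}\}} W(v) = W(v_{\min}) + \sum_{\substack{\{u,v\} \in E' \\ v_{\min} \notin \{u,v\}}} w(\{u,v\}) \le W(v_{\min}) + 2\beta(n-2)\,W(v_{\min}) \le (n-1)\,2\beta\,W(v_{\min}) \le 2\beta\,\frac{n-1}{n}\,\mathrm{OPT}(G) = 2\beta\Bigl(1 - \frac{1}{n}\Bigr)\mathrm{OPT}(G).$$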

Combining these two upper bounds we get

$$w(G') \le \frac{1}{n}\,\mathrm{OPT}(G) + 2\beta\Bigl(1 - \frac{1}{n}\Bigr)\mathrm{OPT}(G) = 2\beta\,(1 - o(1))\,\mathrm{OPT}(G).$$

Thus, Algorithm 1 guarantees the approximation factor $2\beta(1 - o(1))$ when $w$ satisfies the triangle inequality within a factor $\beta$. □

This algorithm might not be applicable to orientation search, as there seems to be little hope of finding distance functions (used in selecting the best common lines) that satisfy even the relaxed triangle inequality for noisy projections. However, in the case of finding functionally analogous genes this is possible, since many distance functions between sequences are metrics. Thus, the algorithm seems very promising for that task.
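The following Python sketch follows the pseudocode of Algorithm 1 step by step. It assumes (hypothetically) that the groups $V_1, \dots, V_n$ are given as lists of hashable vertices and the edge weights as a dictionary from `frozenset({u, v})` to $w(\{u,v\})$; the function names are illustrative. An exponential brute-force `brute_force_minimum_clique` is included only for checking the guarantee of Theorem 10 on small instances.

```python
import math
from itertools import combinations, product


def clique_weight(vertices, weight):
    """Total weight of the clique induced by `vertices`."""
    return sum(weight[frozenset(e)] for e in combinations(vertices, 2))


def minimum_weight_star(groups, weight):
    """Sketch of Algorithm 1 (Minimum-Weight-Star).

    groups: the parts V_1, ..., V_n as lists of hashable vertices.
    weight: dict mapping frozenset({u, v}) to w({u, v}) for u, v in different parts.
    Returns the chosen vertices V' (one per part) and the weight of the n-clique on V'.
    """
    def w(u, v):
        return weight[frozenset({u, v})]

    best, center, center_part = math.inf, None, None
    for i, part in enumerate(groups):
        for v in part:
            # Weight of the lightest n-star centred at v: cheapest edge into each other part.
            star = sum(min(w(u, v) for u in groups[j])
                       for j in range(len(groups)) if j != i)
            if star < best:
                best, center, center_part = star, v, i
    # V' = the centre plus its nearest neighbour in every other part.
    chosen = [center if j == center_part else min(groups[j], key=lambda u: w(u, center))
              for j in range(len(groups))]
    return chosen, clique_weight(chosen, weight)


def brute_force_minimum_clique(groups, weight):
    """Exact OPT(G): try every choice of one vertex per part (exponential time)."""
    return min(clique_weight(choice, weight) for choice in product(*groups))
```

On points of the real line with $w(\{u,v\}) = |x_u - x_v|$, which is a metric and hence satisfies the condition with $\beta = 1$, the bound from the proof of Theorem 10 becomes $(2 - 1/n)\,\mathrm{OPT}(G)$; the brute-force helper makes this easy to spot-check.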

A very natural relaxation of the original problem is to allow small changes to 
common line candidates to make the orientations consistent: 

Problem 11 (Minimum error $l$-constrained line arrangement) Given sets $P_{ij} \subseteq \mathbb{R}^2$, $|P_{ij}| \le l$, $1 \le i < j \le n$, find lines $L_1, \dots, L_n$ in $\mathbb{R}^2$ that minimize the sum of distances $\sum_{1 \le i < j \le n} \min_{p \in P_{ij}} |p_{ij} - p|^q$, where $p_{ij}$ is the actual intersection point of lines $L_i$ and $L_j$ and $q > 0$.
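To make the objective of Problem 11 concrete, the Python sketch below evaluates it for a given choice of lines. It assumes (hypothetically) that each line is given by coefficients `(a, b, c)` of $ax + by = c$, that the candidate sets are a dictionary keyed by index pairs `(i, j)` with $i < j$, and that parallel lines, which have no intersection point $p_{ij}$, count as infinitely bad. It only evaluates a solution; it does not search for the minimizing lines.

```python
import math


def intersection(l1, l2):
    """Intersection of lines a*x + b*y = c given as (a, b, c); None if parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    # Cramer's rule for the 2x2 linear system.
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)


def arrangement_error(lines, candidates, q=2.0):
    """Sum over i < j of min_{p in P_ij} |p_ij - p|^q (objective of Problem 11)."""
    total = 0.0
    for (i, j), points in candidates.items():
        p = intersection(lines[i], lines[j])
        if p is None:
            return math.inf  # assumption: parallel lines are infinitely bad
        total += min(math.dist(p, r) ** q for r in points)
    return total
```

An arrangement has error zero exactly when every pair of lines meets at one of its allowed candidate points.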

Unfortunately, this variant of the problem is also very difficult:

Theorem 11 Problem 11 with $l \ge 6$ is not polynomial-time approximable within $2^{n^k}$ for any fixed $k > 0$ if P ≠ NP.

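Proof. If Problem 11 were polynomial-time approximable within $2^{n^k}$ for some fixed $k > 0$, then the problem of deciding whether there are lines intersecting at the allowed points could be solved in polynomial time: such lines exist if and only if the minimum error line arrangement has error zero, and an approximation within any finite factor must return an arrangement of error zero whenever one exists. □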
3.3 Parameterized Complexity

Even if a problem is NP-hard, it might be solvable in practice if the NP-hardness is caused by some properties of the inputs that do not occur in practice. For example, one might be interested only in vertex covers of size at most $k$. Whether a graph of $n$ vertices has a vertex cover of size at most $k$ can be decided in time $O(n^k)$. However, if $k$ is, e.g., 40 and $n$ is very large, then this time complexity is unacceptable. Instead, we would like to have a time complexity of the form $O(n^c)$ for some reasonably small constant $c$ that does not depend on $k$.
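For vertex cover this kind of running time is in fact achievable. The sketch below is the standard bounded-search-tree algorithm from the parameterized complexity literature, not a method from this report: every edge must have an endpoint in the cover, so the algorithm branches on the two endpoints of some uncovered edge. The search tree has depth at most $k$ and branching factor 2, giving $O(2^k \cdot |E|)$ time, i.e. of the form $f(k)\,n^c$ with $f(k) = 2^k$ and an exponent of $n$ that does not depend on $k$. The function name is illustrative.

```python
def has_vertex_cover(edges, k):
    """Decide whether the graph with edge list `edges` has a vertex cover of size <= k.

    Bounded search tree: some endpoint of the first uncovered edge must be in the
    cover, so try both. Depth <= k, branching factor 2, O(2^k * |E|) time in total.
    """
    if not edges:
        return True          # nothing left to cover
    if k == 0:
        return False         # edges remain but no budget left
    u, v = edges[0]
    for x in (u, v):
        # Put x into the cover and drop every edge it covers.
        rest = [e for e in edges if x not in e]
        if has_vertex_cover(rest, k - 1):
            return True
    return False
```

For instance, a star with 40 leaves has a vertex cover of size 1 (its centre), while a triangle needs two vertices.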

Formally, a parameterized decision problem is a set $D \subseteq \Sigma^* \times \mathbb{N}$ where $\Sigma$ is a finite alphabet. A parameterized decision problem $D$ is fixed-parameter tractable if for each $(x, k) \in \Sigma^* \times \mathbb{N}$ it can be decided whether $(x, k)$ is in $D$ in time $f(k)\,|x|^{O(1)}$, where $f : \mathbb{N} \to \mathbb{N}$ is an arbitrary function. Parameterized complexity classes form a hierarchy similar to the polynomial hierarchy:

$$\mathrm{FPT} \subseteq W[1] \subseteq W[2] \subseteq \cdots \subseteq W[\mathrm{SAT}] \subseteq W[P].$$

All inclusions between the classes are believed to be proper. All problems 
outside the class FPT are called fixed-parameter intractable. 

A parameterized problem $D$ reduces to a parameterized problem $D'$ if there exist functions $f, g : \mathbb{N} \to \mathbb{N}$ and $h : \Sigma^* \times \mathbb{N} \to \Sigma^*$ such that $h(x, k)$ is computable in time $f(k)\,|x|^{O(1)}$ for each instance $(x, k) \in \Sigma^* \times \mathbb{N}$, and $(x, k) \in D$ if and only if $(h(x, k), g(k)) \in D'$. Such a reduction is called a standard parameterized m-reduction. (For further details on parameterized complexity, see [12].)

For the orientation search problem there is a natural parameterization: the 
number of projections can be bounded by a constant. Thus, Problem [T] can be 
turned into the following parameterized problem: 

Problem 12 (k-clique in k,m-partite graph) Given a k,m-partite graph $G = (V_1, \dots, V_k, E)$ and a natural number $k$, decide whether there is a k-clique in $G$.
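Problem 12 can always be decided by exhaustive search, which illustrates why the parameterization matters. In a k-partite graph a k-clique has exactly one vertex in every group, so the Python sketch below tries every choice of one vertex per group: at most $m^k$ choices, each checked with $k(k-1)/2$ adjacency lookups. This is polynomial for each fixed $k$, but the degree grows with $k$, so it is not of the fixed-parameter tractable form $f(k)\,|x|^{O(1)}$. The input encoding (groups as lists of distinct hashable vertices, edges as vertex pairs) and the function name are assumptions made for illustration.

```python
from itertools import combinations, product


def has_partite_clique(groups, edges):
    """Decide Problem 12 by exhaustive search.

    groups: the parts V_1, ..., V_k as lists of distinct hashable vertices.
    edges: iterable of vertex pairs (u, v).
    Tries every choice of one vertex per part (at most m**k choices) and checks
    that all k*(k-1)/2 pairs in the choice are adjacent.
    """
    adjacent = {frozenset(e) for e in edges}
    return any(
        all(frozenset(pair) in adjacent for pair in combinations(choice, 2))
        for choice in product(*groups)
    )
```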

This formulation is interesting because, if we were able to orient a few representative projections very well, then the risk that the orientations found for the other projections based on those well-oriented representatives would be incorrect could be small enough. Thus, there would be a good chance of reconstructing an accurate density map from the orientations found. Unfortunately, this formulation is also fixed-parameter intractable:

Theorem 12 Problem 12 is W[1]-complete.

Proof. Let us first show that k,m-satisfiability is W[1]-hard:
Problem 13 (k,m-satisfiability) Given a set $U$ of boolean variables and a set $C$, $|C| = k$, of clauses $c \in C$, $|c| \le m$, decide whether there is a truth value assignment $f : U \to \{0, 1\}$ that satisfies all clauses in $C$.

Lemma 1 Problem 13 is W[1]-complete.

Proof. The problem is shown to be W[1]-hard by a parameterized reduction from the short nondeterministic Turing machine computation problem, which is known to be W[1]-complete.

Problem 14 (Short nondeterministic Turing machine computation [12]) Given a nondeterministic Turing machine $M$, an input string $x$ and a natural number $k$, decide whether there is a computation of $M$ that accepts the string $x$ in at most $k$ steps.



It can be verified that the reduction used in the proof of Cook's Theorem is a parameterized reduction. Thus, it can also be used here to show that Problem 14 reduces to Problem 13.

Problem 13 can be shown to be in W[1] by a reduction to Problem 14. □

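Problem 12 is W[1]-hard by a reduction from Problem 13 as follows. For each clause $c_i \in C$ there is a group $V_i$ consisting of vertices corresponding to the literals in $c_i$. There is an edge between $v_{i,a} \in V_i$ and $v_{j,b} \in V_j$ if and only if $i \ne j$ and the corresponding literals can be satisfied simultaneously. A k-clique thus selects one literal from each clause such that all selected literals can be satisfied together, which yields a truth value assignment satisfying all clauses in $C$, and vice versa.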

Problem 12 can be shown to be in W[1] by a reduction to Problem 14. □
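The hardness reduction from Problem 13 to Problem 12 can be sketched in Python and compared with brute-force satisfiability on small formulas. The encoding is a hypothetical choice made here: a literal is a pair `(variable, value)` meaning "variable = value", a clause is a list of literals, and the vertex for the $a$-th literal of clause $c_i$ is `(i, a)`. Two literals are taken to be simultaneously satisfiable unless they assign different values to the same variable. The helper `satisfiable` exists only to check the reduction and runs in exponential time.

```python
from itertools import combinations, product


def clauses_to_partite_graph(clauses):
    """Build the k,m-partite graph of the reduction from a list of k clauses."""
    # Group V_i contains one vertex (i, a) for the a-th literal of clause c_i.
    groups = [[(i, a) for a in range(len(c))] for i, c in enumerate(clauses)]
    edges = []
    for i, j in combinations(range(len(clauses)), 2):
        for a, (x, vx) in enumerate(clauses[i]):
            for b, (y, vy) in enumerate(clauses[j]):
                # Compatible unless the same variable gets two different values.
                if x != y or vx == vy:
                    edges.append(((i, a), (j, b)))
    return groups, edges


def has_partite_clique(groups, edges):
    """Exhaustive k-clique test in a k-partite graph (one vertex per group)."""
    adjacent = {frozenset(e) for e in edges}
    return any(all(frozenset(p) in adjacent for p in combinations(choice, 2))
               for choice in product(*groups))


def satisfiable(clauses):
    """Brute-force satisfiability check, used only to validate the reduction."""
    variables = sorted({x for c in clauses for x, _ in c})
    for values in product((0, 1), repeat=len(variables)):
        f = dict(zip(variables, values))
        if all(any(f[x] == v for x, v in c) for c in clauses):
            return True
    return False
```

Note that the parameter is preserved exactly: $k$ clauses yield a graph with $k$ groups, and a clique must have $k$ vertices, as a standard parameterized m-reduction requires.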



4 Conclusions 

We have shown that some approaches for determining orientations for noisy projections of identical particles are computationally very difficult, namely NP-complete, inapproximable and fixed-parameter intractable. These results justify (to some extent) the heuristic approaches widely used in practice.

On the bright side, we have been able to identify some polynomial-time solvable special cases. We have also described an approximation algorithm that achieves the approximation ratio $2\beta(1 - o(1))$ if the instance satisfies the triangle inequality within a factor $\beta$. It has promising applications in the search for functionally analogous genes.

As future work, we wish to study how well current state-of-the-art heuristic search methods find reasonable orientations in practice. This is very challenging due to the enormous size of the search space. Another goal is to analyze the complexity of other approaches for determining the orientations of the projections.